
US006671705B1 

(12) United States Patent (10) Patent No.: US 6,671,705 B1

Duprey et al. (45) Date of Patent: Dec. 30, 2003



(54) REMOTE MIRRORING SYSTEM, DEVICE, 
AND METHOD 

(75) Inventors: Dennis Duprey, Raleigh, NC (US); 

Jeffrey Lucovsky, Cary, NC (US);
Guillermo Roa, Raleigh, NC (US) 

(73) Assignee: EMC Corporation, Hopkinton, MA
(US) 

( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/375,860
(22) Filed: Aug. 17, 1999



U.S. Cl. .................................. 707/20[illegible]; 711/162

Field of Search ........................... [illegible], 202,
707/203, 204; 711/162

(56) References Cited
U.S. PATENT DOCUMENTS



5,544,347 A 8/1996 Yanai et al. 711/162
5,546,536 A * 8/1996 Davis et al. 714/20
5,758,355 A * 5/1998 Buchanan 707/201
5,799,323 A 8/1998 Mosher, Jr. et al. 707/202
5,835,953 A * 11/1998 Ohran 711/162
5,924,096 A * 7/1999 Draper et al. 707/10
6,065,018 A * 5/2000 Beier et al. 707/202
6,173,377 B1 * 1/2001 Yanai et al. 711/162
6,192,460 B1 * 2/2001 Goleman et al. 712A
6,205,449 B1 * 3/2001 Rastogi et al. 707/202
6,260,125 B1 * 7/2001 McDowell 711/162

FOREIGN PATENT DOCUMENTS 

EP 0 405 859 A2 1/1991 

* cited by examiner 

Primary Examiner: John Breene
Assistant Examiner: Khanh Pham

(74) Attorney, Agent, or Firm: Bromberg & Sunstein LLP
(57) ABSTRACT 

In a remote mirroring system, device, and method, a master
storage unit stores information in a log and uses the
information from the log to quickly resynchronize slave images
following a failure in the master storage unit. Upon
receiving a write request from a host, the master storage unit
stores a write entry in the log. The write entry includes
information that identifies a portion of the slave images that
may be unsynchronized from the master image due to the write
request. The master storage unit then proceeds to update the
master image and the slave images. The log is preserved
through the failure, such that the log is available to the
master storage unit upon recovery from the failure. When
the master storage unit is operational following the failure,
the master storage unit resynchronizes the slave images to
the master image by copying those portions of the master
image indicated in the log to the slave images.

U Claims, 7 Drawing Sheets 
REMOTE MIRRORING SYSTEM, DEVICE,
AND METHOD 

CROSS-REFERENCE TO RELATED APPLICATIONS

The following commonly-owned United States patent 
applications may be related to the subject patent application, 
and are hereby incorporated by reference in their entireties: 
Application Ser. No. 09/375,331 entitled A COMPUTER
ARCHITECTURE UTILIZING LAYERED DEVICE
DRIVERS, filed in the names of David Zeryck, Dave 
Harvey, and Jeffrey Lucovsky on even date herewith; 
and 

Application Ser. No. 09/376,173 entitled SYSTEM, 
DEVICE, AND METHOD FOR INTERPROCESSOR 
COMMUNICATION IN A COMPUTER SYSTEM,
filed in the names of Alan L. Taylor, Jeffrey Lucovsky, 
and Karl Owen on even date herewith. 

FIELD OF THE INVENTION

The present invention relates generally to computer storage
systems, and more particularly to remote mirroring in
distributed computer storage systems.

BACKGROUND OF THE INVENTION

In a common computer system architecture, a host com- 
puter is coupled to a computer storage system that provides 
non-volatile storage for the host computer. The computer 
storage system includes, among other things, a number of 
interconnected storage units. Each storage unit includes a 
number of physical or logical storage media (for example, a 
disk array). For convenience, a group of one or more 
physical disks that are logically connected to form a single 
virtual disk is referred to hereinafter as a "Logical Unit" 
(LU). Data from the host computer is stored in the computer
storage system, and specifically in the various storage units 
within the computer storage system. 

One problem in a computer storage system is data loss or
unavailability, for example, caused by maintenance, repair,
or outright failure of one or more storage units. In order to
prevent such data loss or unavailability, a copy of the host
data is often stored in multiple storage units that are operated
at physically separate locations. For convenience, the
practice of storing multiple copies of the host data in
physically separate storage units is referred to as "remote
mirroring." Remote mirroring permits the host data to be
readily retrieved from one of the storage units when the host
data at another storage unit is unavailable or destroyed.

Therefore, in order to reduce the possibility of data loss or
unavailability in a computer storage system, a "remote
mirror" (or simply a "mirror") is established to manage
multiple images. Each image consists of one or more LUs,
which are referred to hereinafter collectively as a "LU Array
Set." It should be noted that the computer storage system
may maintain multiple mirrors simultaneously, where each
mirror manages a different set of images.

Within a particular mirror, one image is designated as a
master image, while each other image within the mirror is
designated as a slave image. For convenience, the storage
unit that maintains the master image is referred to hereinafter
as the "master storage unit," while a storage unit that
maintains a slave image is referred to hereinafter as a "slave
storage unit." It should be noted that a storage unit that
supports multiple mirrors may operate as the master storage
unit for one mirror and the slave storage unit for another
mirror.

In order for a mirror to provide data availability such that
the host data can be readily retrieved from one of the slave
storage units when the host data at the master storage unit is
unavailable or destroyed, it is imperative that all of the slave
images be synchronized with the master image such that all
of the slave images contain the same information as the
master image. Synchronization of the slave images is
coordinated by the master storage unit.

Under normal operating conditions, the host writes host
data to the master storage unit, which stores the host data in
the master image and also coordinates all data storage
operations for writing a copy of the host data to each slave
storage unit in the mirror and verifying that each slave
storage unit receives and stores the host data in its slave
image. The data storage operations for writing the copy of
the host data to each slave storage unit in the mirror can be
handled in either a synchronous manner or an asynchronous
manner. In synchronous remote mirroring, the master storage
unit ensures that the host data has been successfully
written to all slave storage units in the mirror before sending
an acknowledgment to the host, which results in relatively
high latency, but ensures that all slaves are updated before
informing the host that the write operation is complete. In
asynchronous remote mirroring, the master storage unit
sends an acknowledgment message to the host before ensuring
that the host data has been successfully written to all
slave storage units in the mirror, which results in relatively
low latency, but does not ensure that all slaves are updated
before informing the host that the write operation is
complete.
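The difference between the two modes lies only in when the host acknowledgment is issued relative to the slave updates. The following minimal Python sketch illustrates that ordering under simplifying assumptions: each image is a dict mapping block numbers to data, the acknowledgment is recorded in an `events` list, and the names `write_sync`, `write_async`, `drain`, and the `pending` queue are hypothetical illustrations rather than part of the described storage unit.

```python
from collections import deque


def write_sync(master, slaves, block, data, events):
    """Synchronous mirroring: update the master and every slave, then acknowledge."""
    master[block] = data
    for slave in slaves:
        slave[block] = data          # stands in for a confirmed remote write
    events.append(("ack", block))    # host is informed only after all slaves are updated


def write_async(master, slaves, block, data, events, pending):
    """Asynchronous mirroring: update the master, acknowledge, update slaves later."""
    master[block] = data
    events.append(("ack", block))    # host is informed before any slave is updated
    pending.append((block, data))    # slave updates are queued for later transmission


def drain(slaves, pending):
    """Forward queued writes to every slave in arrival order."""
    while pending:
        block, data = pending.popleft()
        for slave in slaves:
            slave[block] = data
```

Between `write_async` and `drain`, the host already holds an acknowledgment while the slaves still lack the data, which is exactly the window in which asynchronous mirroring does not guarantee that all slaves are updated.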

In both synchronous and asynchronous remote mirroring, 
it is possible for the master storage unit to fail sometime 
between receiving a write request from the host and
updating the master image and all of the slave images. The master
storage unit may fail, for example, due to an actual hardware 
or software failure in the master storage unit or an unex- 
pected power failure. If the master storage unit was in the 
process of completing one or more write operations at the 
time of the failure, the master storage unit may not have 
updated any image, may have updated the master image but 
no slave image, may have updated the master image and 
some of the slave images, or may have updated the master 
image and all of the slave images for any particular write 
operation. Furthermore, the master storage unit may or may 
not have acknowledged a particular write request prior to the 
failure. 

After the failure, it may not be possible for the master 
storage unit to determine the status of each slave image, and 
specifically whether a particular slave image matches the 
master image. Therefore, the master storage unit typically 
resynchronizes all of the slave images by copying the master 
image block-by-block to each of the slave storage units. This 
synchronizes the slave images to the master image, but does 
not guarantee that a particular write request was completed. 
Unfortunately, copying the entire master image to all slave 
storage units can take a significant amount of time depend- 
ing on the image size, the number of slave storage units, and 
other factors. It is not uncommon for such a
resynchronization to take hours to complete, especially for very large
images. 

Thus, there is a need for a system, device, and method for 
quickly resynchronizing slave images following a failure.

SUMMARY OF THE INVENTION 

In accordance with one aspect of the present invention, a
master storage unit utilizes a write intent log to quickly
resynchronize slave images following a failure in the master
storage unit. The write intent log is preserved through the
failure, such that the write intent log is available to the
master storage unit upon recovery from the failure. The
write intent log identifies any portions of the slave images
that may be unsynchronized from the master image. The 
master storage unit resynchronizes only those portions of the 
slave images that may be unsynchronized as indicated in the 
write intent log. 

In a preferred embodiment, the write intent log identifies 
any image blocks that may be unsynchronized. In order to 
resynchronize the slave images, the master storage unit 
copies only those image blocks indicated in the write intent 
log from the master image to the slave images. 

By resynchronizing only those portions of the slave 
images that may be unsynchronized, the master storage unit 
is able to resynchronize the slave images in significantly less 
time (perhaps seconds rather than hours) than it would have 
taken to copy the entire master image block-by-block to 
each of the slave storage units. 
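The saving comes from the size of the copy set. The sketch below contrasts a full block-by-block copy with a log-driven copy. It assumes, for illustration only, that images are Python lists of blocks and that the write intent log is a list of hypothetical (offset, length) extents; both functions return the number of block copies performed.

```python
def full_resync(master, slaves):
    """Copy every block of the master image to every slave image."""
    copied = 0
    for slave in slaves:
        slave[:] = master            # the whole image, block by block
        copied += len(master)
    return copied


def log_resync(master, slaves, write_intent_log):
    """Copy only the blocks named in the write intent log to every slave image.

    Each log entry is an (offset, length) extent. Logged blocks are copied even
    if a given slave already matches the master, because the log only records
    that the blocks *may* be unsynchronized.
    """
    blocks = set()
    for offset, length in write_intent_log:
        blocks.update(range(offset, offset + length))
    for slave in slaves:
        for b in sorted(blocks):
            slave[b] = master[b]
    return len(blocks) * len(slaves)
```

For a 1,000-block image with two slaves and a log naming only blocks 10 and 11, `log_resync` performs 4 block copies, whereas `full_resync` would perform 2,000.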


BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects and advantages of the 
invention will be appreciated more fully from the following 
further description thereof with reference to the
accompanying drawings wherein:

FIG. 1 is a block diagram showing an exemplary com- 
puter storage system in accordance with an embodiment of 
the present invention; 

FIG. 2 is a block diagram showing an exemplary storage 
unit in accordance with an embodiment of the present 
invention; 

FIG. 3 is a block diagram showing a conceptual view of 
the relevant logic blocks of a storage processor in accor- 
dance with an embodiment of the present invention; 

FIG. 4 is a state diagram showing the three primary states 
of a mirror in accordance with an embodiment of the present 
invention; 

FIG. 5 is a state diagram showing the four primary states 
of a remote mirror image in accordance with an embodiment
of the present invention; 

FIG. 6 is a logic flow diagram showing exemplary logic 
for processing a write request in accordance with an embodi- 
ment of the present invention; 

FIG. 7 is a logic flow diagram showing exemplary logic
for removing unneeded write entries from a write intent log 
in accordance with an embodiment of the present invention; 

FIG. 8 is a logic flow diagram showing exemplary logic 
for automatically storing the write intent log in a
non-volatile storage upon detecting a failure in accordance with
an embodiment of the present invention; 

FIG. 9 is a logic flow diagram showing exemplary logic 
for resynchronizing the slave images following a failure in
accordance with an embodiment of the present invention; 

FIG. 10A is a block diagram showing the state of an
exemplary write intent log after receiving a number of write 
requests in accordance with an embodiment of the present 
invention; and 

FIG. 10B is a block diagram showing the state of an
exemplary write intent log at the time of a failure and after 
recovering from the failure in accordance with an embodi- 
ment of the present invention. 

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT 

An embodiment of the present invention enables the 
master storage unit to quickly resynchronize slave images 
following a failure by only updating those portions of the
slave images that may be unsynchronized from the
corresponding portions of the master image (i.e., any portions of
the slave images that may differ from the corresponding
portions of the master image). Specifically, the master
storage unit maintains a log (referred to hereinafter as the
"write intent log") that identifies any portions of the slave
images that may be unsynchronized. The write intent log is 
maintained in such a way that it is guaranteed to survive a 
failure, and therefore is available to the master storage unit 
following a failure. When the master storage unit is opera- 
tional following the failure, the master storage unit resyn- 
chronizes the slave images by resynchronizing those por- 
tions of the slave images that may be unsynchronized, 
preferably by copying from the master image to each of the 
slave storage units those image blocks that may be
unsynchronized as identified in the write intent log. A portion of
the slave images identified in the write intent log is
resynchronized even if the identified portion is in fact
synchronized with the master image in one or more slave images. By
resynchronizing only those portions of the slave images that 
may have been unsynchronized, the master storage unit is 
able to resynchronize the slave images in significantly less 
time (perhaps seconds rather than hours) than it would have 
taken to copy the entire master image block-by-block to 
each of the slave storage units. 

More specifically, when the master storage unit receives 
a write request from the host, the master storage unit stores 
a write entry in the write intent log. The write entry includes 
information that identifies a portion of the mirror image to 
be affected by the write operation (such as a block offset and
length) as well as, in the case of asynchronous mirroring, the 
actual data to be written into the mirror image. For 
convenience, the information in the write entry is referred to 
hereinafter as "meta-data" in order to distinguish it from the 
actual data to be written into the mirror image. 
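A write entry of this kind might be represented as follows. This is a sketch under stated assumptions: the field names, the use of block units for offset and length, and the `WriteIntentLog` container with increasing entry ids are hypothetical. It shows only the distinction drawn above, namely that an entry always carries meta-data and carries the actual data only for asynchronous mirroring.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WriteEntry:
    offset: int                    # first block affected by the write (meta-data)
    length: int                    # number of blocks affected (meta-data)
    data: Optional[bytes] = None   # actual data; retained only for asynchronous mirroring

    def blocks(self) -> range:
        return range(self.offset, self.offset + self.length)


@dataclass
class WriteIntentLog:
    asynchronous: bool
    entries: Dict[int, WriteEntry] = field(default_factory=dict)
    next_id: int = 0

    def record(self, offset: int, length: int, data: bytes) -> int:
        """Store a write entry before any image is updated; return its id."""
        entry = WriteEntry(offset, length, data if self.asynchronous else None)
        entry_id = self.next_id
        self.entries[entry_id] = entry
        self.next_id += 1
        return entry_id

    def dirty_blocks(self) -> set:
        """Blocks that may be unsynchronized between the master and slave images."""
        return {b for e in self.entries.values() for b in e.blocks()}
```

Keeping the data out of synchronous entries keeps them small. The offset and length alone are enough to tell the master which blocks to recopy after a failure.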

In a preferred embodiment of the present invention, the
master storage unit maintains the write intent log in a
high-speed memory (referred to hereinafter as the "write
cache") during normal operation of the mirror. This allows
the master storage unit to quickly add write entries to the
write intent log. If the master storage unit includes redundant 
storage processors (described in detail below), then each 
storage processor maintains its own write intent log that is 
replicated on the peer storage processor. This allows one 
storage processor to take over for the other storage processor 
when a storage processor (but not the entire master storage 
unit) fails. In case of a complete master storage unit failure, 
the master storage unit includes automatic backup/restoral
logic that, among other things, automatically stores the
entire write intent log in a non-volatile storage (such as a
disk) upon detecting the failure and automatically restores
the write intent log from the non-volatile storage upon
recovery from the failure. The automatic backup/restoral
logic is extremely robust, with redundant battery backup and 
redundant storage capabilities to ensure that the write intent 
log is recoverable following the failure. 
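The replication and backup/restoral behavior described above can be sketched as follows. In this illustration, a dict stands in for the write cache and a JSON file stands in for the non-volatile storage. The class and method names (`WriteCache`, `mirror_to`, `add`, `dump`, `restore`) are hypothetical. A real storage processor would rely on battery-backed memory and redundant on-disk copies rather than a single file, so the sketch shows only the data flow: replicate each entry to the peer, write the whole log out on failure, and read it back on recovery.

```python
import json
import os
import tempfile


class WriteCache:
    """Hypothetical write cache holding a write intent log of (offset, length) entries."""

    def __init__(self):
        self.log = {}      # entry id -> [offset, length]
        self.peer = None   # write cache of the peer storage processor, if any

    def mirror_to(self, peer):
        self.peer = peer

    def add(self, entry_id, offset, length):
        self.log[entry_id] = [offset, length]
        if self.peer is not None:
            self.peer.log[entry_id] = [offset, length]   # replicate on the peer SP

    def dump(self, path):
        """On failure: write the whole log to non-volatile storage, then rename atomically."""
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self.log, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    @classmethod
    def restore(cls, path):
        """On recovery: rebuild the log from non-volatile storage."""
        cache = cls()
        with open(path) as f:
            # JSON object keys are strings, so convert entry ids back to int
            cache.log = {int(k): v for k, v in json.load(f).items()}
        return cache
```

The write-then-rename step is a common way to ensure that a crash during the dump leaves either the previous copy or the new copy on disk, never a partial one.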

Once the master storage unit has stored a write entry in the 
write intent log, the master storage unit proceeds to update 
the master image and the slave image(s) based upon the 
write request. Assuming the master storage unit is able to 
successfully update the master image and all of the slave 
images, then the write entry is no longer needed, in which 
case the master storage unit deletes the write entry from the 
write intent log. In a preferred embodiment of the present 
invention, the master storage unit deletes write entries from 
the write intent log using a "lazy" deletion mechanism. 
Specifically, the master storage unit deletes an unneeded
write entry from the write intent log at a time that is
convenient for the master storage unit, and not necessarily as
soon as the master storage unit determines that the write
entry is no longer needed. The master storage unit typically
runs periodic cleanups of the write intent log, where the
master storage unit may delete a number of unneeded write
entries during each cleanup cycle. This deletion scheme is
implementationally more efficient than deleting the
unneeded write entries one-at-a-time, although it may allow
some unneeded write entries to persist in the write intent log
for some period of time.

If a storage processor fails, the write operations
corresponding to any write entries in the write intent log may be
at different points of completion. For example, the master
storage unit may not have updated any image, may have
updated the master image but no slave image, may have
updated the master image and some of the slave images, or
may have updated the master image and all of the slave
images for any particular write operation.

Assuming that the master storage unit has redundant
storage processors, the write intent log maintained by the
failed storage processor is replicated on the peer storage
processor. Therefore, once the peer storage processor has
taken over for the failed storage processor, the master
storage unit resynchronizes all of the slave images to the
master image by updating only those portions of the slave
images identified in the write intent log, preferably by
copying the corresponding image blocks from the master
image to the slave storage units.

If the master storage unit fails, the automatic
backup/restoral logic automatically stores the write intent log in the
non-volatile storage. When this occurs, the write operations
corresponding to any write entries in the write intent log may
be at different points of completion. For example, the master
storage unit may not have updated any image, may have
updated the master image but no slave image, may have
updated the master image and some of the slave images, or
may have updated the master image and all of the slave
images for any particular write operation.

Once the master storage unit is operational following the
failure, the automatic backup/restoral logic automatically
restores the write intent log from the non-volatile storage.
The master storage unit then resynchronizes all of the slave
images to the master image by updating only those portions
of the slave images identified in the write intent log,
preferably by copying the corresponding image blocks from the
master image to the slave storage units.

FIG. 1 shows an exemplary computer system 100 in
accordance with an embodiment of the present invention.
The exemplary computer system 100 includes a host 110
coupled to a computer storage system 120. The computer
storage system 120 includes a master storage unit 130 and a
number of slave storage units 140_1 through 140_N. The
remote mirroring functionality of the present invention
requires each storage unit in the computer storage system
120 to maintain a communication link to all of the other
storage units in the computer storage system 120, such that
each storage unit is capable of addressing all of the other
storage units in the computer storage system 120. The host
110 is coupled to the master storage unit 130, and accesses
the mirror through the master storage unit 130.

In a preferred embodiment of the present invention, each
of the storage units in the computer storage system, such as
the master storage unit 130 and the slave storage units 140_1
through 140_N in the computer storage system 120, is a
fault-tolerant RAID (redundant array of independent disks)
storage unit with redundant management and storage
capabilities. Remote mirroring is implemented as an add-on
feature of the storage unit. Remote mirroring is used for
providing disaster recovery, remote backup, and other data
integrity solutions, specifically by keeping byte-for-byte
copies of an image at multiple geographic locations.

As shown in FIG. 2, a preferred storage unit 200 includes
an Administrative Interface 201, at least one Host Interface
202, at least a first Storage Processor (SP) 204 and an
optional, second SP 208, a number of disks arranged as a
Disk Array 206, and a Network Interface 210. The
Administrative Interface 201 is preferably an Ethernet interface
through which the storage unit 200 is managed and
controlled. The Host 110 interfaces with the storage unit 200
through the Host Interface 202, which preferably emulates a
SCSI interface. The Host Interface 202 is coupled to the SP
204 and to the optional SP 208, such that the Host 110 can
communicate with both the SP 204 and the optional SP 208.
The SP 204 and the optional SP 208 are interconnected
through an interface 209, which is preferably a FibreChannel
interface. The SP 204 and the optional SP 208 are also
coupled to the Network Interface 210 via the interface 209,
which enables each SP (204, 208) to communicate with SPs
in other storage units within the computer storage system.

A preferred SP (204, 208) is based upon a multiple
processor hardware platform that runs an operating system.
Both SPs (204, 208) run essentially the same software,
although the software can differ between the two SPs, for
example, due to a software upgrade of one but not the other
SP. Therefore, each SP (204, 208) is capable of providing
full management functions for the storage unit.

The SP software requires each LU to be owned and
accessed through one and only one SP at a time. This notion
of LU ownership is referred to as "assignment." The SP
software allows each LU in a LU Array Set to be "assigned"
to a different SP. During normal operation of the storage
unit, both SPs process requests and perform various
management functions in order to provide redundancy for the
storage unit. If one of the SPs fails, the other SP takes over
management of the LUs for the failed SP.

Remote mirroring can be implemented with different SPs
managing different LUs in a LU Array Set (or even with both
SPs sharing access to each LU in the LU Array Set).
However, such an implementation would require substantial
inter-SP coordination for storing information in the LU
Array Set. A preferred embodiment of the present invention
therefore requires that all LUs in a LU Array Set be
"assigned" to the same SP, thereby eliminating any inter-SP
coordination for storing information in the LU Array Set.
Thus, in a preferred embodiment of the invention, each
mirror image is managed by one SP at a time. For
convenience, the SP that is primarily responsible for
managing a particular mirror image is referred to hereinafter as
the "primary" SP, while the other SP is referred to hereinafter
as the "secondary" SP. For purposes of the following
discussion, and with reference again to FIG. 2, the SP 204
will be referred to as the "primary" SP, and the SP 208 will
be referred to as the "secondary" SP.

FIG. 3 shows a conceptual view of the relevant
components of a SP, such as the primary SP 204 and the secondary
SP 208, for operation in the master storage unit 130. As
shown in FIG. 3, the SP includes, among other things,
remote mirroring logic 302, write cache 304, automatic
backup/restoral logic 306, and disk management logic 308.
The disk management logic 308 provides a range of services
that permit the various components of the SP, including the
remote mirroring logic 302 and the automatic
backup/restoral logic 306, to access the Disk Array 206 and to
communicate with other SPs, both within the same storage
unit via the interface 209 and across storage units via the 
interface 209. The remote mirroring logic 302 utilizes ser- 
vices provided by the disk management logic 308 to main- 
tain the master image in the Disk Array 206 and communi- 
cate with the slave storage units for coordinating updates of 
the slave images. The remote mirroring logic 302 is indi- 
rectly coupled to the Host Interface 202, through which the 
remote mirroring logic 302 interfaces with the Host 110. The 
remote mirroring logic 302 maintains the write intent log in 
the write cache 304, which is a local high-speed memory on 
the SP that is replicated on the peer SP (i.e., the write cache 
304 on the SP 204 is replicated on the SP 208, and the write 
cache 304 on the SP 208 is replicated on the SP 204). The
automatic backup/restoral logic 306 automatically stores the
write cache 304, including the write intent log, in the Disk
Array 206 upon detecting a failure of the master storage unit
130 and restores the write cache 304 from the Disk Array 
206 when the SP recovers from the failure. In a preferred 
embodiment of the present invention, the remote mirroring 
logic 302 is implemented as a layered device driver that 
intercepts and processes information that is sent by the Host
110, as described in the related patent application entitled A
COMPUTER ARCHITECTURE UTILIZING LAYERED
DEVICE DRIVERS, which was incorporated by reference 
above. 

In order to perform remote mirroring, the remote mirroring
logic 302 requires a certain amount of persistent storage
in the Disk Array 206. This persistent storage is used by the 
remote mirroring logic 302 to keep track of certain infor- 
mation (described in detail below), such as mirror state 
information, mirror membership information, mirror image
configuration information, and other information needed to
ensure proper operation of the mirror. Because this infor- 
mation is critical to the operation of the storage unit and to 
the computer storage system as a whole, the information 
must be highly available, and therefore redundant copies of 
the information are preferably maintained within the Disk
Array 206 in case of a partial disk array failure. 

As noted above, a LU Array Set is composed of one or
more LUs. The ability to treat a group of LUs as a single
entity simplifies the host administrator's task of managing a
remote mirror for a host volume aggregated from one or 
more LUs. Remote mirroring uses this abstraction to pre- 
serve the ordering of all write requests between logically 
connected LUs when updating slave images. When using 
asynchronous mirroring, this ordering can be very important
for database engines that spread tables and views across 
what it sees as multiple devices for performance and locality 
reasons. 
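The text does not say how this ordering is enforced. One common technique, assumed here purely for illustration, is to stamp every write with a single sequence number shared by all LUs in the LU Array Set and to have the slave apply updates strictly in sequence order, holding back any update that arrives early. `ArraySetForwarder` and `SlaveApplier` are hypothetical names.

```python
import heapq
from itertools import count


class ArraySetForwarder:
    """Master side: one sequence counter shared by every LU in the LU Array Set."""

    def __init__(self):
        self._seq = count()

    def stamp(self, lu, block, data):
        return (next(self._seq), lu, block, data)


class SlaveApplier:
    """Slave side: apply updates strictly in sequence order, buffering gaps."""

    def __init__(self, lus):
        self.images = {lu: {} for lu in lus}
        self.applied = []   # (lu, block) pairs in the order they were applied
        self._next = 0      # next sequence number expected
        self._held = []     # min-heap of updates that arrived out of order

    def receive(self, update):
        heapq.heappush(self._held, update)
        while self._held and self._held[0][0] == self._next:
            _, lu, block, data = heapq.heappop(self._held)
            self.images[lu][block] = data
            self.applied.append((lu, block))
            self._next += 1
```

Because the counter spans the whole LU Array Set rather than each LU separately, a write to LU B is never applied ahead of an earlier write to LU A. This is the cross-device ordering a database spreading tables over several LUs depends on.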

Each LU Array Set within a mirror, whether it is composed
of a single LU or multiple LUs, must be of the exact
same physical size. This is because the master storage unit 
does a block-for-block forwarding of every write request it
receives from the host system. If each image is constructed 
from a single LU, then each LU must be of the same physical 
size. If each image is constructed from multiple LUs, then
the corresponding LUs between a master and its slaves must 
be the same physical size. For example, if the master image 
is composed of LUs A and B of sizes 8 Gb and 4 Gb, 
respectively, then each slave image must be composed of 
two LUs A' and B' of sizes 8 Gb and 4 Gb, respectively.
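The sizing rule can be expressed as a simple check. In this sketch, an LU Array Set is an ordered list of hypothetical `LU` records whose i-th entry on the master corresponds to the i-th entry on a slave. Sizes must match position by position, while RAID levels are deliberately not compared, since, as described next, they may differ between images.

```python
from typing import List, NamedTuple


class LU(NamedTuple):
    size: int         # physical size, in the same unit for every LU (e.g. Gb)
    raid_level: str   # e.g. "5" or "0"; allowed to differ between images


def sizes_compatible(master: List[LU], slave: List[LU]) -> bool:
    """True if the slave LU Array Set matches the master LU-for-LU in physical size."""
    if len(master) != len(slave):
        return False
    return all(m.size == s.size for m, s in zip(master, slave))
```

With the example above, a master built from LUs of 8 and 4 accepts a slave built from LUs of 8 and 4 regardless of RAID level. It rejects a slave built from the same LUs in the opposite order, or from a single LU of 12.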

While the physical size of a LU Array Set must be 
consistent between images of the mirror, the RAID level of 
the LUs within each LU Array Set may be different. The
RAID level of a LU determines a number of LU attributes, 
such as the manner in which information is stored in the LU, 
the amount of time it takes to store the information in the 
LU, and the amount of information that can be recovered
from the LU in case of a LU failure. A preferred storage unit 
supports RAID levels 0, 1, 1/0, 3, and 5, which are
well-known in the art. Among the various RAID levels, RAID
level 5 provides the highest level of information recovery in 
case of a LU failure, but takes the most time to store the 
information in the LU. RAID level 0 provides the lowest 
level of information recovery in case of a LU failure, but 
takes the least amount of time to store the information in the 
LU. Each LU can be assigned a different RAID level. 

In one embodiment of the present invention, the LUs
associated with the master image are configured for RAID
level 5, while the LUs associated with the slave image(s) are
configured for RAID level 0. Using RAID level 5 for the
master image makes the master image extremely robust.
Using RAID level 0 for the slave image(s) allows each slave
image to be written into its respective slave storage unit
relatively quickly, which can reduce latency, particularly in
synchronous remote mirroring.

The remote mirroring functionality can be described with
reference to the operational states of a mirror in conjunction 
with the operational relationships between the master image 
and the slave image(s). 

FIG. 4 is a state diagram showing the operational states of 
a mirror. For convenience, the state diagram shown in FIG. 
4 does not show certain failure transitions and/or failure 
states. As shown in FIG. 4, there are three (3) primary states 
for a mirror, namely INACTIVE (402), ACTIVE (404), and 
ATTENTION (406). The primary distinction between the
three (3) states is the way in which the mirror responds to 
read and write requests from the host. 

The default mirror state is the INACTIVE state (402). In 
the INACTIVE state (402), the host is not permitted to 
access the master image. Thus, the host cannot read from the 
master image or write to the master image. The mirror
defaults to the INACTIVE state (402) when the mirror is
created, and the mirror must be in the INACTIVE state (402) 
before the mirror can be deleted. 

When the mirror is in the INACTIVE state (402), the
administrator can attempt to activate the mirror. If the
administrator attempts to activate the mirror and the mirror 
meets all minimum requirements for normal operation, then 
the mirror transitions into the ACTIVE state (404). 
However, if the administrator attempts to activate the mirror 
but the mirror fails to meet all minimum conditions for 
normal operation, the mirror transitions into the ATTEN- 
TION state (406). 

The normal operating mirror state is the ACTIVE state 
(404). In the ACTIVE state (404), the host is permitted to 
access the master image. Thus, the host can read from the 
master image and write to the master image. If at any time 
the mirror fails to meet all minimum conditions for normal 
operation, the mirror automatically transitions into the 
ATTENTION state (406). The mirror transitions into the 
INACTIVE state (402) under direct administrative control. 

The ATTENTION state (406) indicates that there is a 
problem somewhere within the mirror that is preventing the 
mirror from operating normally. In the ATTENTION state 
(406), the host is not permitted to access the master image. 
Thus, the host cannot read from the master image or write 
to the master image. If at any time the mirror meets all 
minimum conditions for normal operation, the mirror auto- 
matically transitions into the INACTIVE state (402). 
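The three mirror states and the transitions described above can be modelled as a small state machine. The following Python sketch is illustrative only: the function names (`host_may_access`, `on_activate`, `on_condition_check`, `on_deactivate`) are hypothetical, and, as in FIG. 4, failure transitions are omitted.

```python
from enum import Enum


class MirrorState(Enum):
    # Values are the reference numerals used in FIG. 4.
    INACTIVE = 402   # default state; host may not access the master image
    ACTIVE = 404     # normal operation; host may read and write the master image
    ATTENTION = 406  # a problem prevents normal operation; host may not access


def host_may_access(state: MirrorState) -> bool:
    """Only the ACTIVE state lets the host read or write the master image."""
    return state is MirrorState.ACTIVE


def on_activate(state: MirrorState, meets_minimum: bool) -> MirrorState:
    """Administrator attempts to activate the mirror from the INACTIVE state."""
    if state is not MirrorState.INACTIVE:
        raise ValueError("activation is attempted from the INACTIVE state")
    return MirrorState.ACTIVE if meets_minimum else MirrorState.ATTENTION


def on_condition_check(state: MirrorState, meets_minimum: bool) -> MirrorState:
    """Automatic transitions driven by the minimum conditions for normal operation."""
    if state is MirrorState.ACTIVE and not meets_minimum:
        return MirrorState.ATTENTION
    if state is MirrorState.ATTENTION and meets_minimum:
        return MirrorState.INACTIVE
    return state


def on_deactivate(state: MirrorState) -> MirrorState:
    """ACTIVE -> INACTIVE occurs only under direct administrative control."""
    if state is not MirrorState.ACTIVE:
        raise ValueError("this sketch only deactivates an ACTIVE mirror")
    return MirrorState.INACTIVE
```

In this model, recovering from ATTENTION returns the mirror to INACTIVE rather than ACTIVE, so host access resumes only after the administrator activates the mirror again.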
FIG. 5 is a state diagram showing the operational
relationships between the master image and a single slave
image from the perspective of the slave image. It should be
noted that different slave images may be in different states
relative to the master image, and therefore the data contained
in various slave images may differ. As shown in FIG. 5, there
are four (4) primary states, namely UNSYNCHRONIZED (502),
SYNCHRONIZED (504), CONSISTENT (506), and SYNCHRONIZING
(508).

A slave image is considered to be in the UNSYNCHRONIZED
state (502) when no known relationship between the data in
the slave image and the data in the master image can be
readily determined. This is the case, for example, when the
slave image is first added to the mirror.

From the UNSYNCHRONIZED state (502), the slave image
transitions into the SYNCHRONIZING state (508) if and when
the mirror is in the ACTIVE state (404). This is an implicit
action taken by the remote mirroring software in the slave
storage unit.

Also from the UNSYNCHRONIZED state (502), the slave image
may be placed in the SYNCHRONIZED state (504) through
administrative action. Specifically, the administrator can
explicitly synchronize the slave image with the master image
by placing the mirror in the INACTIVE state (402), copying
the master image to the slave image or otherwise creating the
slave image to be identical to the master image, and
explicitly marking the slave image as being in the
SYNCHRONIZED state (504).

A slave image is considered to be in the SYNCHRONIZED state
(504) when the slave image is an exact byte-for-byte
duplicate of the master image. This implies that there are
no outstanding write requests from the host that have not
been committed to stable storage on both the master image
and the slave image.

From the SYNCHRONIZED state (504), the slave image
transitions into the CONSISTENT state (506) when the mirror
is in the ACTIVE state (404) and the master image commits a
write request into its stable storage. At that point, the
slave image is no longer an exact byte-for-byte duplicate of
the master image, although the slave image is still
consistent with the previous state of the master image.

A slave image is considered to be in the CONSISTENT state
(506) if it is not currently an exact byte-for-byte
duplicate of the master image but is a byte-for-byte
duplicate of the master image at some determinable point in
the past. In synchronous remote mirroring, the slave image
can differ from the master image by at most one write
request, since the master storage unit updates all slave
storage units for each write request. However, in
asynchronous remote mirroring, the slave image can differ
from the master image by more than one write request, since
the master storage unit updates the slave storage units
asynchronously with respect to the write requests.

From the CONSISTENT state (506), the slave image transitions
into the SYNCHRONIZED state (504) if the mirror is in the
ACTIVE state (404) and both the master image and the slave
image have committed all write requests to stable storage
(i.e., there are no outstanding write requests). This
transition is made under the control of the master image.
The slave image may also be placed in the SYNCHRONIZED state
(504) by the administrator.

Also from the CONSISTENT state (506), the slave image
transitions into the SYNCHRONIZING state (508) when either
(1) the mirror is in the INACTIVE state (402) and the
administrator explicitly forces the slave image into the
SYNCHRONIZING state (508), or (2) the mirror is in the
ACTIVE state (404) and the slave image determines that one
or more write updates from the master image have been lost
in transit.

Also from the CONSISTENT state (506), the slave image
transitions into the UNSYNCHRONIZED state (502) if the
mirror is in the ACTIVE state (404) and the write history
maintained by the master storage unit is corrupted or lost.

A slave image is considered to be in the SYNCHRONIZING state
(508) if it is being explicitly updated from the master
image in a manner that is not the direct consequence of a
host write to the master image. It should be noted that the
actual synchronizing operation may require a full
byte-for-byte copy of the master image or only the
transmission (or retransmission) of a series of write
requests.

From the SYNCHRONIZING state (508), the slave image
transitions to the UNSYNCHRONIZED state (502) if, for any
reason, the slave image fails to be synchronized with the
master image. In this case, an attempt may be made to
synchronize the slave image, although such synchronization
may be impossible in certain situations, for example, due to
lost communication to the slave image.

Also from the SYNCHRONIZING state (508), the slave image
transitions to the CONSISTENT state (506) upon successful
completion of the synchronization operation, regardless of
the method used to synchronize the slave image.

It should be noted that the slave synchronization operations
are completed transparently to the host. In order to prevent
slave synchronization operations from affecting normal
access to the mirror by the host, a throttling mechanism is
used to limit the number of transactions between the master
image and the slave image.

As described above, the host is only permitted to access the
mirror through the master storage unit. Therefore, the
remote mirroring driver prevents certain accesses to LUs
associated with the mirror, specifically by intercepting
certain requests that are received from higher level
drivers. In order to intercept requests, each storage unit
maintains a LU List identifying all of the storage unit LUs
that are associated with the mirror. The remote mirroring
driver in each slave storage unit intercepts any read or
write request from a higher level driver that is targeted
for a LU in the LU List and denies access to the LU,
specifically by preventing the request from being processed
by the lower level driver(s). Similarly, the remote
mirroring driver in the master storage unit intercepts any
write request from a higher level driver that is targeted
for a LU in the LU List in order to perform the appropriate
remote mirror functions. However, the remote mirroring
driver in the master storage unit allows all read requests
from higher level drivers to be processed by the lower level
driver(s).

Each storage unit that participates in a mirror maintains a
complete copy of a mirror database in its persistent
storage. As mirror-related information changes, each storage
unit updates its mirror database so that all participants
have the same view of the mirror. This update across all
mirror members is done in "atomic" fashion (i.e., the update
across all mirror members is treated as a single operation
that must be completed by all mirror members). By keeping
this information local to each storage unit, the role of the
master image can be assumed by any image in the mirror as
directed by the administrator.

The information stored within the mirror database serves two
purposes. The first is to provide persistent storage of each
mirror's attributes. The second is to assist during failover
conditions by maintaining the mirror's state information.
The information in the mirror database is modified
indirectly via administrator operations and/or directly via
operational use of the mirror. The minimum amount of
information required to meet the above purposes is
maintained in the mirror database.
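The slave-image transitions of FIG. 5 described above can be collected into a single transition table. In this Python sketch the event names are hypothetical labels for the conditions given in the text, the mirror state is passed as a plain string, and a guard of `None` marks a transition for which the text states no mirror-state condition.

```python
from enum import Enum, auto


class SlaveState(Enum):
    # Values are the reference numerals used in FIG. 5.
    UNSYNCHRONIZED = 502
    SYNCHRONIZED = 504
    CONSISTENT = 506
    SYNCHRONIZING = 508


class Event(Enum):
    # Hypothetical labels for the conditions described in the text.
    MIRROR_ACTIVE = auto()          # mirror is (or becomes) ACTIVE
    ADMIN_MARK_SYNCED = auto()      # administrator marks the slave SYNCHRONIZED
    MASTER_COMMIT = auto()          # master commits a host write to stable storage
    NO_OUTSTANDING_WRITES = auto()  # master and slave have committed all writes
    ADMIN_FORCE_SYNC = auto()       # administrator forces a resynchronization
    UPDATES_LOST = auto()           # slave detects updates lost in transit
    HISTORY_LOST = auto()           # master's write history corrupted or lost
    SYNC_FAILED = auto()            # synchronization could not be completed
    SYNC_DONE = auto()              # synchronization completed successfully


S, E = SlaveState, Event

# (current state, event, required mirror state or None) -> next state
TRANSITIONS = {
    (S.UNSYNCHRONIZED, E.MIRROR_ACTIVE, "ACTIVE"): S.SYNCHRONIZING,
    (S.UNSYNCHRONIZED, E.ADMIN_MARK_SYNCED, "INACTIVE"): S.SYNCHRONIZED,
    (S.SYNCHRONIZED, E.MASTER_COMMIT, "ACTIVE"): S.CONSISTENT,
    (S.CONSISTENT, E.NO_OUTSTANDING_WRITES, "ACTIVE"): S.SYNCHRONIZED,
    (S.CONSISTENT, E.ADMIN_MARK_SYNCED, None): S.SYNCHRONIZED,
    (S.CONSISTENT, E.ADMIN_FORCE_SYNC, "INACTIVE"): S.SYNCHRONIZING,
    (S.CONSISTENT, E.UPDATES_LOST, "ACTIVE"): S.SYNCHRONIZING,
    (S.CONSISTENT, E.HISTORY_LOST, "ACTIVE"): S.UNSYNCHRONIZED,
    (S.SYNCHRONIZING, E.SYNC_FAILED, None): S.UNSYNCHRONIZED,
    (S.SYNCHRONIZING, E.SYNC_DONE, None): S.CONSISTENT,
}


def next_state(current: SlaveState, event: Event, mirror_state: str) -> SlaveState:
    """Return the next slave state, or the current one if no transition applies."""
    # Try a rule guarded by the current mirror state first, then an unguarded rule.
    for guard in (mirror_state, None):
        nxt = TRANSITIONS.get((current, event, guard))
        if nxt is not None:
            return nxt
    return current
```

Keeping the guards in the table makes it easy to check each row against the prose, for example that a lost update only triggers resynchronization while the mirror is ACTIVE.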

The information maintained for a particular mirror in the
mirror database can be categorized as mirror-wide informa- 
tion and image-specific information. 

In a preferred embodiment of the present invention, the 
mirror-wide information includes, among other things, a 
mirror name, a mirror state, a fracture log size parameter, a
mirror extent size parameter, a maximum missing images 
parameter, a minimum images required parameter, a heart- 
beat parameter, a synchronization priority parameter, a write 
policy parameter, and a write backlog size parameter. 

The mirror name is a symbolic name for the mirror. The
mirror name is provided by the administrator when the 
mirror is created. The mirror name is maintained as a text 
string field within the mirror database. 

The mirror state indicates whether the mirror is in the 
INACTIVE state (402), the ACTIVE state (404), or the 
ATTENTION state (406). The mirror state is updated 
dynamically by the remote mirroring software. 

The fracture log size parameter specifies the size of each
fracture log in units of mirror extent size (described below). 
The fracture log size parameter determines the number of 
missed updates that can be stored in the fracture log.

The mirror extent size parameter specifies the size, in 
blocks, to be used when a partial or full synchronization is
necessary. 

The maximum missing images parameter sets the maxi- 
mum number of images that are allowed to be missing from 
the mirror while allowing the mirror to remain active. When 
this limit is reached, the mirror cannot be activated if it is in
the INACTIVE state (402), or is placed in the ATTENTION 
state (406) if the mirror is in the ACTIVE state (404). A 
value of zero (0) requires that all slave images be present in 
order for the mirror to be active. A value of negative one (-1) 
is used to disable this feature.

The minimum images required parameter sets the mini- 
mum number of images that must be available before the 
mirror can be activated. Setting this value equal to the total 
number of images in the mirror requires that all images be 
present before the mirror can be activated. A value of 
negative one (-1) is used to disable this feature. 
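The two image-count parameters can be combined into a single activation check. This is a sketch under two stated assumptions: the maximum missing images limit is treated as violated only when the number of missing images exceeds it (the reading under which a value of zero requires every slave image to be present), and the function name `can_activate` is hypothetical.

```python
def can_activate(total_images: int, available_images: int,
                 max_missing: int, min_required: int) -> bool:
    """Return True if the image-count conditions for activating the mirror hold.

    A value of -1 disables the corresponding check, as described in the text.
    """
    missing = total_images - available_images
    # Maximum missing images parameter: too many absent images blocks activation.
    if max_missing != -1 and missing > max_missing:
        return False
    # Minimum images required parameter: enough images must be available.
    if min_required != -1 and available_images < min_required:
        return False
    return True
```

For a three-image mirror, `can_activate(3, 2, 0, -1)` is `False` because one image is missing and none may be, while `can_activate(3, 3, -1, 3)` is `True` because the minimum equals the total number of images and all of them are present.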

The heartbeat parameter sets the frequency of a heartbeat
signal that is used by the master storage unit to determine 
"reachability" of each slave storage unit. 

The synchronization priority parameter sets a priority for 
the mirror relative to any other mirrors maintained by the 
storage unit. When multiple mirrors must be synchronized, 
the synchronization priority parameter is used to schedule 
each mirror's synchronization in order to minimize the 
amount of mirror interconnect bandwidth devoted to syn- 
chronization. 

The write policy parameter specifies whether the mirror is 
synchronous or asynchronous. 

The write backlog size parameter sets the amount, in
blocks, of host writes that can be queued on the master for
subsequent delivery to the slave(s). The write backlog size
parameter is only used for asynchronous remote mirroring.
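The write backlog can be pictured as a queue on the master whose capacity is counted in blocks rather than in requests. This Python sketch is hypothetical: the class and method names are invented, and because the text does not say what the master does when the backlog is full, `enqueue` simply reports that the write could not be queued.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class WriteBacklog:
    """Hypothetical queue of host writes awaiting asynchronous delivery to slaves.

    Capacity is counted in blocks, matching the write backlog size parameter.
    """

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity_blocks = capacity_blocks
        self.queued_blocks = 0
        self._queue: Deque[Tuple[int, int]] = deque()  # (start block, block count)

    def enqueue(self, start_block: int, num_blocks: int) -> bool:
        """Queue a host write; return False if it would exceed the backlog size."""
        if self.queued_blocks + num_blocks > self.capacity_blocks:
            return False  # the text does not specify what the master does here
        self._queue.append((start_block, num_blocks))
        self.queued_blocks += num_blocks
        return True

    def next_for_delivery(self) -> Optional[Tuple[int, int]]:
        """Remove and return the oldest queued write, or None if nothing is queued."""
        if not self._queue:
            return None
        write = self._queue.popleft()
        # Simplification: space is freed on hand-off rather than on slave acknowledgement.
        self.queued_blocks -= write[1]
        return write
```

Delivering writes in arrival order keeps each slave image equal to some earlier state of the master, which is what the CONSISTENT state requires.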

In a preferred embodiment of the present invention, the
image-specific information includes, among other things, an
SP identifier, a LU Array Set identifier, an image designator,
a mirror image state, a cookie, a timeout value parameter, a
synchronization rate parameter, a synchronization progress
indicator, and a recovery policy parameter.

The SP identifier uniquely identifies the primary SP and, 
if available, the secondary SP for the image. 

The LU Array Set identifier identifies the one or more 
constituent LUs for the image. 

The image designator specifies whether the image is a 
master image or a slave image. 

The mirror image state indicates whether the image is in 
the UNSYNCHRONIZED state (502), the SYNCHRO- 
NIZED state (504), the CONSISTENT state (506), or the 
SYNCHRONIZING state (508). 

The cookie is a dynamically updated value that contains 
consistency information that relates the state of the image to 
the state of the mirror. 

The timeout value parameter indicates the amount of time 
that a slave storage unit is permitted to remain unreachable 
before the master storage unit considers it to be
UNREACHABLE (described below).

The synchronization rate parameter indicates the rate at 
which image synchronizations are done, which is the mecha- 
nism by which synchronizations are throttled. 

The synchronization progress indicator is used to maintain
the status of a slave image synchronization. This value
is consulted when an unreachable slave that had been 
undergoing synchronization becomes reachable. 

The recovery policy parameter specifies whether or not 
the image should be automatically resynchronized when the
image comes online. 

In a preferred embodiment of the present invention, 
remote mirrors and their corresponding images are managed 
through a set of administrative operations. These adminis- 
trative operations change certain characteristics or behavior 
of the entire mirror. Image operations are intended for a
specific image of the mirror. In particular, some image 
operations are intended for the master image, while other 
operations are intended for a particular slave image. 

Unless otherwise indicated, an administrative operation 
must be sent to the master storage unit, which in turn 
propagates the operation to the appropriate slave storage 
unit(s) as needed, specifically using a Message Passing 
Service (MPS) as described in the related application 
entitled SYSTEM, DEVICE, AND METHOD FOR
INTERPROCESSOR COMMUNICATION IN A COMPUTER
SYSTEM, which was incorporated by reference above. The 
master storage unit maintains status information for each 
slave storage unit in the mirror, specifically whether or not 
the slave storage unit is REACHABLE or UNREACH- 
ABLE (i.e., whether or not the master storage unit is able to 
communicate with the slave storage unit). If the master 
storage unit attempts to propagate mirror configuration 
information to a slave storage unit and the slave storage unit 
fails to acknowledge receipt of the mirror configuration 
information, then the master storage unit marks the slave 
storage unit as UNREACHABLE and propagates new mir- 
ror configuration information to the remaining slave storage 
units in the mirror.
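The reachability rule above can be sketched as a loop that re-propagates the configuration until every remaining slave acknowledges it. In this Python sketch `send` is a hypothetical stand-in for the Message Passing Service (it returns `True` when the slave acknowledges), and the `unreachable` field added to the configuration is an invented way of representing the new mirror configuration information.

```python
from typing import Callable, Dict


def propagate_config(config: dict, slaves: Dict[str, str],
                     send: Callable[[str, dict], bool]) -> dict:
    """Send config to REACHABLE slaves, marking any that fail to acknowledge UNREACHABLE.

    `slaves` maps slave id -> "REACHABLE" or "UNREACHABLE" and is updated in place.
    Each time a slave drops out, a new configuration is built and re-sent to the
    remaining slaves. Returns the configuration every remaining slave accepted.
    """
    while True:
        failed = [sid for sid, status in slaves.items()
                  if status == "REACHABLE" and not send(sid, config)]
        if not failed:
            return config
        for sid in failed:
            slaves[sid] = "UNREACHABLE"
        # New mirror configuration reflecting the changed reachability.
        config = dict(config, unreachable=sorted(
            sid for sid, status in slaves.items() if status == "UNREACHABLE"))
```

The loop terminates because each pass either succeeds or moves at least one more slave out of the REACHABLE set.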

The remote mirroring software must be notified of any
configuration changes that affect the operation of mirrors. 
Such configuration changes are not mirror operations per se,
but require notification to the mirroring software in order to 
ensure proper mirror behavior. For example, the remote 
mirroring software in each SP must be notified when an LU 
is reassigned from one SP to the other SP so that the SPs can 
coordinate any mirror-related recovery caused by the tran- 
sition. 



12/31/2003, EAST Version: 1.4.1 




In order to create a mirror, the administrator first creates 
a LU Array Set on the master storage unit and configures the 
LU Array Set to operate as a master image. The adminis- 
trator then invokes a CREATE MIRROR function in the 
master storage unit, specifying the LU Array Set and a
mirror name. The CREATE MIRROR function initializes 
mirror configuration information and adds the LU Array Set 
to the LU List maintained by the master storage unit. If the
LU Array Set does not exist or the LU Array Set is part of 
another mirror, then the CREATE MIRROR function fails to 
create the mirror. However, assuming that the CREATE 
MIRROR function completes successfully, then the mirror 
consists of a single (master) image, and is in the INACTIVE 
state (402).
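The CREATE MIRROR checks and their result can be sketched as follows. The `StorageUnit` class, its fields, and the dictionary layout of a mirror record are hypothetical; for brevity the LU List here holds LU Array Set names rather than individual LUs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StorageUnit:
    """Hypothetical model of a master storage unit's mirror bookkeeping."""
    lu_array_sets: Set[str] = field(default_factory=set)  # LU Array Sets that exist
    lu_list: Set[str] = field(default_factory=set)        # LU Array Sets in some mirror
    mirrors: Dict[str, dict] = field(default_factory=dict)

    def create_mirror(self, lu_array_set: str, mirror_name: str) -> dict:
        """CREATE MIRROR: fail unless the LU Array Set exists and is not already mirrored."""
        if lu_array_set not in self.lu_array_sets:
            raise ValueError("LU Array Set does not exist")
        if lu_array_set in self.lu_list:
            raise ValueError("LU Array Set is part of another mirror")
        mirror = {"name": mirror_name, "state": "INACTIVE",
                  "images": {lu_array_set: "master"}}
        self.mirrors[mirror_name] = mirror
        # Adding to the LU List makes the driver start intercepting requests to it.
        self.lu_list.add(lu_array_set)
        return mirror
```

A newly created mirror therefore holds exactly one (master) image and starts INACTIVE, so no host I/O reaches it until it is activated.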

Once a mirror is created, the administrator can add a slave 
image to the mirror, remove a slave image from the mirror, 
promote a slave image to operate as the master image, 
demote the master image to operate as a slave image, 
synchronize a slave image, fracture a slave image, restore a 
fractured slave image, activate the mirror, deactivate the
mirror, or destroy the mirror. The administrator can also
change mirror attributes or retrieve mirror attributes. 

In order to add a slave image to the mirror, the adminis- 
trator first creates a LU Array Set on the slave storage unit 
and configures the LU Array Set to operate as a slave image.
The administrator then instructs the master storage unit to 
add the slave image to the mirror. The master storage unit in 
turn instructs the slave storage unit to add the slave image to 
the mirror. The slave storage unit may reject the request, for 
example, if the slave image is already in the mirror, the LU
Array Set does not exist, or the LU Array Set is part of 
another mirror. However, assuming that the slave storage 
unit adds the slave image to the mirror, then the master 
storage unit updates its mirror configuration information to 
include the slave image, and the master storage unit
distributes the new mirror configuration information to all slave
storage units. 

It should be noted that the slave image can be added to the 
mirror in either the SYNCHRONIZED state (504) or the 
UNSYNCHRONIZED state (502). Adding the slave image
in the SYNCHRONIZED state (504) avoids any synchro- 
nization operations. Adding the slave image in the UNSYN- 
CHRONIZED state (502) requires synchronization opera- 
tions to synchronize the slave image to the master image. If 
the mirror is in the INACTIVE state (402) when the
unsynchronized slave image is added to the mirror, then the slave
image remains in the UNSYNCHRONIZED state (502). If 
the mirror is in the ACTIVE state (404) when the
unsynchronized slave image is added to the mirror or the mirror is
subsequently activated as described below, a synchronization
operation is performed to synchronize the slave image
to the master image. 
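The slave-side checks for adding a slave image, together with the rule for when a newly added unsynchronized image gets synchronized, can be sketched as below. The function name, its parameters, and the returned tuple are hypothetical; the returned state is the state in which the image is added, before any transition to SYNCHRONIZING.

```python
from typing import Dict, Set, Tuple


def add_slave_image(slave_lu_array_sets: Set[str], slave_lu_list: Set[str],
                    mirror_images: Dict[str, str], lu_array_set: str,
                    add_as_synchronized: bool, mirror_state: str) -> Tuple[str, bool]:
    """Add a slave image; return (state it is added in, whether synchronization starts now)."""
    # Rejections the slave storage unit may apply.
    if lu_array_set in mirror_images:
        raise ValueError("slave image is already in the mirror")
    if lu_array_set not in slave_lu_array_sets:
        raise ValueError("LU Array Set does not exist")
    if lu_array_set in slave_lu_list:
        raise ValueError("LU Array Set is part of another mirror")
    slave_lu_list.add(lu_array_set)
    mirror_images[lu_array_set] = "slave"
    if add_as_synchronized:
        return "SYNCHRONIZED", False  # no synchronization operations needed
    # An unsynchronized image waits while the mirror is INACTIVE and is
    # synchronized once the mirror is (or later becomes) ACTIVE.
    return "UNSYNCHRONIZED", mirror_state == "ACTIVE"
```

Adding an image as SYNCHRONIZED is only safe when the administrator already knows it is an exact copy, which is why that path skips synchronization entirely.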

In order to remove a slave image from the mirror, the 
administrator first deactivates the mirror as described below. 
The administrator then instructs the master storage unit to
remove the slave image from the mirror. The administrator 
can request either a graceful removal of the slave image or 
a forced removal of the slave image. If the administrator 
requests a graceful removal of the slave image, then all 
outstanding requests to the slave image are completed before
removing the slave image from the mirror. If the
administrator requests a forced removal of the slave image, then the
slave image is removed without completing any outstanding 
requests. In either case, the master storage unit instructs the
slave storage unit to remove the slave image from the mirror. After
verifying that the LU Array Set is part of the mirror, the slave 
storage unit removes the LU Array Set from the mirror, and 




removes the LU Array Set from the LU List. As a result, the 
remote mirroring driver in the slave storage unit stops 
intercepting requests that are targeted for the LUs in the LU 
Array Set. The master storage unit updates its mirror con- 
figuration information to exclude the slave image, and the 
master storage unit distributes the new mirror configuration
information to all slave storage units. It should be noted that 
removing the slave image from the mirror does not delete the 
corresponding LUs or the data contained therein. 

In order to promote a slave image to operate as the master 
image, the mirror cannot have a master image, and therefore 
the administrator must first demote an existing master image 
to operate as a slave image as described below. The admin- 
istrator then instructs the slave storage unit to promote itself. 
Before promoting itself to master, the slave storage unit 
verifies that there is no existing master image in the mirror, 
that the slave image is in either the SYNCHRONIZED state 
(504) or CONSISTENT state (506), and that the slave image 
had not previously been removed from the mirror or marked 
UNREACHABLE. Assuming that there is no existing master
image in the mirror, the slave image is in either the
SYNCHRONIZED state (504) or CONSISTENT state 
(506), and the slave image had not previously been removed 
from the mirror or marked UNREACHABLE, then the slave 
storage unit promotes itself to operate as the master storage 
unit, in which case the new master storage unit updates its 
mirror configuration information and sends new mirror 
configuration information to the slave storage units in the 
mirror. It should be noted that the administrator can explic- 
itly override the latter two promotion conditions, forcing the 
slave storage unit to promote itself to master as long as there
is no existing master image in the mirror. 
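The promotion checks, including which of them the administrator may override, reduce to a short predicate. The function `can_promote` and its parameters are hypothetical names for the conditions listed above.

```python
def can_promote(has_master: bool, slave_state: str,
                was_removed_or_unreachable: bool, force: bool = False) -> bool:
    """Checks a slave storage unit makes before promoting itself to master."""
    if has_master:
        return False  # never overridable: an existing master must be demoted first
    if force:
        return True   # administrator override of the remaining two conditions
    return (slave_state in ("SYNCHRONIZED", "CONSISTENT")
            and not was_removed_or_unreachable)
```

The non-overridable first check is what prevents two images from acting as master at the same time.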

In order to demote the master image to operate as a slave 
image, the administrator first deactivates the mirror as 
described below. The administrator then instructs the master 
storage unit to demote itself. Assuming the master storage 
unit is available, then the master storage unit updates its 
mirror configuration information and sends new mirror 
configuration information to the slave storage units in the 
mirror. However, if the master storage unit is unavailable, 
for example, due to failure, the administrator may instruct a 
slave storage unit to demote the master storage unit. Each 
slave storage unit updates its mirror configuration
information to indicate that there is no master image in the mirror.

In order to synchronize a slave image, the administrator 
instructs the master storage unit to synchronize the slave
image. The master storage unit performs a block-by-block 
copy of the master image to the slave image. This can be 
done while the mirror is in the ACTIVE state (404) or in the 
INACTIVE state (402). Any incoming write requests that 
are received by the master storage unit during resynchroni- 
zation of the slave image are forwarded to the slave storage 
unit if and only if the write request is directed to a portion 
of the image that has already been written to the slave. A
throttling mechanism is used to pace the synchronization
operation in order to prevent the synchronization
operation from overloading the communication links between
storage units.
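The forwarding rule during synchronization can be captured by tracking how far the block-by-block copy has progressed. This Python sketch is hypothetical throughout: the class, the `send_block` callback, and the fixed-rate `time.sleep` throttle are stand-ins (the text says only that a throttling mechanism paces the copy), and the master image is modelled as a list of blocks.

```python
import time
from typing import Callable, List


class BlockCopySync:
    """Hypothetical throttled block-by-block copy of a master image to one slave."""

    def __init__(self, master: List[bytes], send_block: Callable[[int, bytes], None],
                 blocks_per_second: float) -> None:
        self.master = master                     # master image as a list of blocks
        self.send_block = send_block             # delivers one block to the slave
        self.interval = 1.0 / blocks_per_second  # fixed-rate pacing between blocks
        self.copied_upto = 0                     # blocks [0, copied_upto) are on the slave

    def copy_next(self) -> bool:
        """Copy the next block; return False once the whole image has been copied."""
        if self.copied_upto >= len(self.master):
            return False
        self.send_block(self.copied_upto, self.master[self.copied_upto])
        self.copied_upto += 1
        time.sleep(self.interval)  # throttle so the link is not overloaded
        return True

    def host_write(self, block: int, data: bytes) -> None:
        """Apply a host write to the master; forward it only if the block was already copied."""
        self.master[block] = data
        if block < self.copied_upto:
            self.send_block(block, data)
        # A block not yet copied picks up the new data when the copy reaches it.
```

Forwarding only writes to already-copied blocks avoids sending a block twice. The `copied_upto` counter is the kind of state a synchronization progress indicator might record, although the text does not define that indicator's format.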

In order to activate the mirror, the administrator instructs 
the master storage unit to activate the mirror. The master 
storage unit updates its mirror configuration information to 
put the mirror into the ACTIVE state (404), and informs all 
slave storage units that the mirror is active. Each slave
storage unit in turn updates its mirror configuration infor- 
mation to put the mirror into the ACTIVE state (404). 

In order to deactivate the mirror, the administrator
instructs the master storage unit to deactivate the mirror.
The administrator can request either a graceful deactivation
of the mirror or a forced deactivation of the mirror. If the
administrator requests a graceful deactivation of the mirror,
then all outstanding requests to the slave images are
completed before deactivating the mirror. If the
administrator requests a forced deactivation of the mirror,
then the mirror is deactivated without completing any
outstanding requests. In either case, the master storage
unit updates its mirror configuration information to put the
mirror into the INACTIVE state (402), and removes all LUs
associated with the mirror from the LU List. As a result,
the remote mirroring driver in the master storage unit stops
intercepting write requests that are targeted for the LUs in
the LU Array Set. The master storage unit also informs all
slave storage units that the mirror is inactive. Each slave
storage unit in turn updates its mirror configuration
information to put the mirror into the INACTIVE state (402).

In order to change mirror attributes, the administrator
sends a request to the master storage unit. The master
storage unit in turn updates its mirror configuration
information and the mirror state (if necessary), and
propagates the change request to the slave storage unit(s).
Each slave storage unit updates its mirror configuration
information and mirror state accordingly.

In order to retrieve mirror attributes (specifically, a copy
of the mirror attributes for each image in the mirror), the
administrator sends a request to any storage unit in the
mirror. The receiving storage unit retrieves the mirror
attributes for its own image, and also retrieves the mirror
attributes for the other images in the mirror from the
respective storage units. The receiving storage unit returns
a copy of the mirror attributes for each image in the mirror
(or a set of error codes for any unretrievable image) to the
administrator.

In order to destroy the mirror, the mirror must consist of
only the master image, and the mirror must be in the
INACTIVE state (402). Thus, in order to destroy the mirror,
the administrator first removes all slave images from the
mirror and then deactivates the mirror, as described above.
The administrator then instructs the master storage unit to
destroy the mirror. The master storage unit removes all
mirror configuration information associated with the mirror,
and removes all LUs associated with the mirror from the LU
List. As a result, the remote mirroring driver in the master
storage unit stops intercepting write requests that are
targeted for the LUs in the LU Array Set. It should be noted
that destroying the mirror does not delete the corresponding
LUs or the data contained therein.

During operation of the mirror, the primary SP in the master
storage unit, and particularly the remote mirroring logic
302, maintains the write intent log in the write cache.
Maintaining the write intent log involves storing write
entries in the write intent log and removing unneeded write
entries from the write intent log, and may also involve
storing the write intent log in a non-volatile storage (such
as the Disk Array 206) upon detecting a failure and
restoring the write intent log from the non-volatile storage
upon recovery from the failure, particularly if the write
intent log is kept in a volatile storage such as a Random
Access Memory (RAM). Each write entry includes meta-data
derived from a write request, and preferably identifies a
particular image block that is being updated by the remote
mirroring logic 302. So long as a particular write entry is
in the write intent log, the remote mirroring logic 302
considers the corresponding image block to be potentially
unsynchronized across all mirror images. Once all mirror
images are updated based upon a particular write request,
the remote mirroring logic 302 is free to remove the
corresponding write entry from the write intent log.

FIG. 6 is a logic flow diagram showing exemplary remote
mirroring logic for processing a write request. Beginning in
step 602, and upon receiving the write request, in step 604,
the remote mirroring logic stores a write entry in the write
intent log including meta-data derived from the write
request, in step 606. In a preferred embodiment of the
present invention, the meta-data includes a block identifier
identifying the image block being updated, although the
meta-data may additionally or alternatively include write
request information indicating one or more modifications to
the image. After storing the write entry in the write intent
log, the remote mirroring logic proceeds to update the
master image based upon the write request, in step 608, and
proceeds to update the slave images based upon the write
request, in step 610. The remote mirroring logic for
processing the write request terminates in step 699.

FIG. 7 is a logic flow diagram showing exemplary remote
mirroring logic for removing unneeded write entries from the
write intent log, particularly using the "lazy" deletion
technique. The remote mirroring logic periodically tests
each write entry in the write intent log to determine
whether the write entry is still needed, and removes any
write entry that is determined to be unneeded. The write
entry is considered to be needed if the remote mirroring
logic is still in the process of updating the mirror images
based upon the corresponding write request, and is
considered to be unneeded if the remote mirroring logic has
updated all mirror images based upon the corresponding write
request.

Therefore, beginning in step 702, the remote mirroring logic
determines whether there is an untested write entry in the
write intent log, in step 704. If all write entries in the
write intent log have been tested (NO in step 704), then the
remote mirroring logic for removing unneeded write entries
from the write intent log terminates in step 799. However,
if there is an untested write entry in the write intent log
(YES in step 704), then the remote mirroring logic proceeds
to determine whether the write entry is still needed, in
step 706. If the write entry is still needed (YES in step
706), then the remote mirroring logic recycles to step 704.
However, if the write entry is unneeded (NO in step 706),
then the remote mirroring logic removes the write entry from
the write intent log, in step 708, and recycles to step 704.

During operation of the mirror, it is possible for the
master image to fail. The master image can fail due to a SP
failure, a communication failure, or a media failure. When
the master image fails, the mirror cannot be accessed until
either the master image is repaired or a slave image is
promoted to operate as a master image as described above.
Furthermore, failure of the master while a slave
synchronization operation is taking place leaves the slave's
state unchanged from what it was when the synchronization
operation started. Once a master is established (either by
repairing the current master or promoting a slave to operate
as a master), the synchronization operation is restarted.

As mentioned above, an SP failure can cause a master image
failure. An SP failure in a master storage unit that has a
single SP results in an outright failure of the master
image. However, failure of one SP in a master storage unit
that has two SPs does not prevent the master storage unit
from operating in the mirror, since the remaining SP is able
to assume management and control of the master image so that
the mirror can continue operating as usual (but without the
security of a backup SP).
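The write intent log handling described above (one entry stored per write in FIG. 6, and lazy removal of unneeded entries in FIG. 7) can be sketched as a small Python class. The class and method names are hypothetical, each entry keeps only a block identifier plus the set of images already updated, and persistence to the Disk Array 206 and replication to a secondary SP are not modelled.

```python
from typing import Dict, List, Set


class WriteIntentLog:
    """Hypothetical sketch of the write intent log of FIGS. 6 and 7."""

    def __init__(self, images: List[str]) -> None:
        self.images = set(images)                # master image and slave images
        self._updated: Dict[int, Set[str]] = {}  # entry id -> images already updated
        self._block: Dict[int, int] = {}         # entry id -> image block identifier
        self._next_id = 0

    def log_write(self, block: int) -> int:
        """Store a write entry (FIG. 6, step 606) before any image is updated."""
        entry_id = self._next_id
        self._next_id += 1
        self._updated[entry_id] = set()
        self._block[entry_id] = block
        return entry_id

    def image_updated(self, entry_id: int, image: str) -> None:
        """Record that one image has been updated for this write (steps 608/610)."""
        self._updated[entry_id].add(image)

    def lazy_cleanup(self) -> int:
        """Test every entry and remove those no longer needed (FIG. 7)."""
        unneeded = [e for e, done in self._updated.items() if done >= self.images]
        for e in unneeded:
            del self._updated[e]
            del self._block[e]
        return len(unneeded)

    def possibly_unsynchronized(self) -> Set[int]:
        """Blocks whose entries remain in the log and may need resynchronization."""
        return set(self._block.values())
```

Because an entry stays in the log until a cleanup pass finds every image updated, `possibly_unsynchronized()` may over-report (a fully written block whose entry has not yet been swept) but, provided each entry is logged before any image is touched, it never under-reports. Recovery can therefore recopy just the listed blocks instead of the whole image.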
Therefore, when the primary SP in the master storage unit fails, the secondary SP assumes control of the master image. At the time of the failure, the write operations corresponding to any write entries in the write intent log may be at different points of completion. For example, the remote mirroring logic 302 may not have updated any image, may have updated the master image but no slave image, may have updated the master image and some of the slave images, or may have updated the master image and all of the slave images for any particular write operation. However, because the write intent log from the primary SP is replicated on the secondary SP, the secondary SP is able to resynchronize the slave images using the replicated write intent log. Specifically, rather than copying the entire master image to each of the slave storage units, the remote mirroring logic determines any portions of the slave images that may be unsynchronized based upon the write entries in the write intent log, and then resynchronizes only those portions of the slave images that may be unsynchronized, preferably by copying only those image blocks that may be unsynchronized.

On the other hand, if the master storage unit fails, the automatic backup/restoral logic 306 automatically stores the write intent log in the Disk Array 206. FIG. 8 is a logic flow diagram showing exemplary automatic backup/restoral logic. Beginning in step 802, and upon detecting an SP failure, in step 804, the automatic backup/restoral logic 306 stores the write intent log in the Disk Array 206, in step 806, and terminates in step 899. In a preferred embodiment of the present invention, the master storage unit includes battery backup capabilities, allowing the automatic backup/restoral logic 306 to store the write intent log in the Disk Array 206 even in the case of a power failure. Furthermore, the automatic backup/restoral logic 306 actually stores multiple copies of the write intent log in the Disk Array 206 so that the write intent log can be recovered in case of a partial disk failure.

At the time of the failure, the write operations corresponding to any write entries in the write intent log may be at different points of completion. For example, the remote mirroring logic 302 may not have updated any image, may have updated the master image but no slave image, may have updated the master image and some of the slave images, or may have updated the master image and all of the slave images for any particular write operation.

Once the master storage unit is operational following the failure, the primary SP (which may be either the primary SP or the secondary SP from prior to the failure), and particularly the automatic backup/restoral logic 306, restores the write intent log from the Disk Array 206. The remote mirroring logic 302 may then be instructed to resynchronize the slave images. Rather than copying the entire master image to each of the slave storage units, the remote mirroring logic determines any portions of the slave images that may be unsynchronized based upon the write entries in the write intent log, and then resynchronizes only those portions of the slave images that may be unsynchronized, preferably by copying only those image blocks that may be unsynchronized.

FIG. 9 is a logic flow diagram showing exemplary logic for resynchronizing the slave images following a failure in the master storage unit. Beginning in step 902, the logic first restores the write intent log from the Disk Array 206, in step 904. The logic then determines any portions of the slave images that may be unsynchronized as indicated by the write entries in the write intent log, in step 906, and resynchronizes only those portions of the slave images indicated by the write entries in the write intent log, in step 908. The logic terminates in step 999.

Certain aspects of the present invention can be demonstrated by example. In this example, the master storage unit receives a first write request to write to a first image block (Block 1), a second write request to write to a second image block (Block 2), a third write request to write to a third image block (Block 3), and a fourth write request to write to a fourth image block (Block 4). The master storage unit adds write entries into the write intent log indicating that Block 1, Block 2, Block 3, and Block 4 may be unsynchronized, as depicted in FIG. 10A, and then proceeds to update the mirror images based upon the write requests. Before a failure occurs, the master storage unit is able to update all of the mirror images based upon the first write request and delete the corresponding write entry (i.e., Block 1) from the write intent log, update all of the mirror images based upon the second write request but not delete the corresponding write entry from the write intent log, update some (but not all) of the mirror images based upon the third write request, and update none of the mirror images based upon the fourth write request. Thus, as shown in FIG. 10B, the write intent log includes write entries for Block 2, Block 3, and Block 4 at the time of the failure. It should be noted that, at the time of the failure, only Block 3 is actually unsynchronized, since the master storage unit has completed updating Block 2 for all mirror images and has not updated Block 4 in any of the mirror images. Upon recovery from the failure, the write intent log is as shown in FIG. 10B, since the write intent log is preserved through the failure. Thus, the master storage unit copies Block 2, Block 3, and Block 4, as indicated in the write intent log, from the master image to all of the slave storage units for resynchronizing the slave images.

In a preferred embodiment of the present invention, predominantly all of the logic for maintaining the write intent log and utilizing the write intent log to resynchronize the slave images following a failure in the master storage unit is implemented as a set of computer program instructions that are stored in a computer readable medium and executed by an embedded microprocessor system within the master storage unit, and more particularly within a storage processor running in the master storage unit. Preferred embodiments of the invention may be implemented in any conventional computer programming language. For example, preferred embodiments may be implemented in a procedural programming language (e.g., "C") or an object oriented programming language (e.g., "C++"). Alternative embodiments of the invention may be implemented using discrete components, integrated circuitry, programmable logic used in conjunction with a programmable logic device such as a Field Programmable Gate Array (FPGA) or microprocessor, or any other means including any combination thereof.

Alternative embodiments of the invention may be implemented as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk), or fixed in a computer data signal embodied in a carrier wave that is transmittable to a computer system via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the
functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web).

Thus, the present invention may be embodied as a method for synchronizing a plurality of data images in a computer system. The plurality of data images include a master image and at least one slave image. The method involves maintaining a log identifying any portions of the plurality of data images that may be unsynchronized and resynchronizing only those portions of the plurality of data images that may be unsynchronized. Maintaining the log involves receiving a write request and storing in the log a write entry comprising information derived from the write request. The information derived from the write request may be a block identifier identifying an image block that may be unsynchronized or write update information indicating one or more modifications to the plurality of data images. Maintaining the log also involves updating the master image based upon the write request, updating the at least one slave image based upon the write request, and removing the write entry from the log after updating the master image and the at least one slave image based upon the write request. Maintaining the log may also involve writing the log to a non-volatile storage upon detecting a failure and restoring the log from the non-volatile storage upon recovery from the failure. Resynchronizing only those portions of the plurality of data images that may be unsynchronized involves copying only those portions of the master image to the at least one slave image. In a preferred embodiment of the present invention, the log identifies a number of image blocks that may be unsynchronized, in which case resynchronizing only those portions of the plurality of data images that may be unsynchronized involves copying only those image blocks that may be unsynchronized from the master image to the at least one slave image.

The present invention may also be embodied as an apparatus for maintaining a plurality of data images in a computer system. The plurality of data images include a master image and at least one slave image. The apparatus includes at least a non-volatile storage for storing at least the master image, a network interface for accessing the at least one slave image, a write intent log for indicating any portions of the at least one slave image that may be unsynchronized, and remote mirroring logic for maintaining the plurality of data images. The remote mirroring logic includes, among other things, resynchronization logic for resynchronizing the at least one slave image to the master image following a failure by resynchronizing only those portions of the at least one slave image that may be unsynchronized as indicated by the write intent log. The remote mirroring logic also includes receiving logic operably coupled to receive a write request from a host and log maintenance logic operably coupled to store in the write intent log a write entry including information derived from the write request. The information derived from the write request may be a block identifier identifying an image block that may be unsynchronized or write update information indicating one or more modifications to the plurality of data images. The remote mirroring logic also includes master image updating logic for updating the master image based upon the write request and slave image updating logic for updating the at least one slave image based upon the write request. The log maintenance logic removes the write entry from the write intent log after updating the master image and the at least one slave image based upon the write request, preferably using a "lazy" deletion technique. The apparatus may also include automatic backup/restoral logic for storing the write intent log in the non-volatile storage upon detecting a failure and restoring the write intent log from the non-volatile storage upon recovery from the failure. The resynchronization logic copies only those portions of the master image that may be unsynchronized to the at least one slave image. In a preferred embodiment of the present invention, the write intent log identifies a number of image blocks that may be unsynchronized, in which case the resynchronization logic copies only those image blocks that may be unsynchronized from the master image to the at least one slave image.

The present invention may also be embodied in a computer program for maintaining a plurality of data images in a computer system. The plurality of data images include a master image and at least one slave image. The computer program includes disk management logic providing an interface to a non-volatile storage for storing at least the master image and to a network interface for accessing the at least one slave image and remote mirroring logic for maintaining the plurality of data images. The remote mirroring logic includes log maintenance logic programmed to maintain a write intent log indicating any portions of the at least one slave image that may be unsynchronized and resynchronization logic programmed to resynchronize the at least one slave image to the master image following a failure by resynchronizing only those portions of the at least one slave image that may be unsynchronized as indicated by the write intent log. The remote mirroring logic also includes receiving logic operably coupled to receive a write request from a host, in which case the log maintenance logic is programmed to store in the write intent log a write entry including information derived from the write request. The information derived from the write request may be a block identifier identifying an image block that may be unsynchronized or write update information indicating one or more modifications to the plurality of data images. The remote mirroring logic also includes master image updating logic for updating the master image based upon the write request and slave image updating logic for updating the at least one slave image based upon the write request. The log maintenance logic is programmed to remove the write entry from the write intent log after updating the master image and the at least one slave image based upon the write request, preferably using a "lazy" deletion technique. The computer program may also include automatic backup/restoral logic for storing the write intent log in the non-volatile storage via the disk management logic upon detecting a failure and restoring the write intent log from the non-volatile storage via the disk management logic upon recovery from the failure. The resynchronization logic copies only those portions of the master image that may be unsynchronized to the at least one slave image via the disk management logic. In a preferred embodiment of the present invention, the write intent log identifies a number of image blocks that may be unsynchronized, in which case the resynchronization logic copies only those image blocks that may be unsynchronized from the master image to the at least one slave image via the disk management logic.
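The automatic backup/restoral logic summarized above (logic 306, FIG. 8) stores multiple copies of the write intent log in the Disk Array 206 so that the log can be recovered after a partial disk failure. A minimal Python sketch follows under loudly stated assumptions: the log is a set of block numbers, the disk array is modelled as a dict, and the key names, the copy count of three, and the CRC-32 check used to choose an intact copy are all illustrative choices, not details given in the patent. The helper names `save_log` and `restore_log` are hypothetical.

```python
import json
import zlib
from typing import Dict, Iterable, Set

COPIES = 3  # assumed number of redundant copies kept in the disk array


def save_log(entries: Iterable[int], disk: Dict[str, bytes], copies: int = COPIES) -> None:
    """Write several checksummed copies of the write intent log (hypothetical layout)."""
    payload = json.dumps(sorted(entries)).encode("utf-8")
    # Prefix each copy with a 4-byte CRC-32 so a damaged copy can be detected.
    record = zlib.crc32(payload).to_bytes(4, "big") + payload
    for slot in range(copies):
        disk[f"write-intent-log/{slot}"] = record


def restore_log(disk: Dict[str, bytes], copies: int = COPIES) -> Set[int]:
    """Return the entries from the first copy whose CRC-32 still matches."""
    for slot in range(copies):
        record = disk.get(f"write-intent-log/{slot}")
        if record is None or len(record) < 4:
            continue  # copy missing, e.g. lost with a failed disk
        stored_crc = int.from_bytes(record[:4], "big")
        payload = record[4:]
        if zlib.crc32(payload) == stored_crc:
            return set(json.loads(payload.decode("utf-8")))
    raise RuntimeError("no intact copy of the write intent log")
```

Because any one intact copy is enough, losing or corrupting some copies still lets recovery proceed; only when every copy is unreadable does the sketch give up, at which point a full copy of the master image would be the fallback.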

The present invention may also be embodied as a computer system having a master storage unit for maintaining a master image and at least one slave storage unit for maintaining a slave image. The master storage unit maintains a log identifying any portions of the slave image that may be unsynchronized, and copies from the master image to the at least one slave storage unit only those portions of the master image identified in the log. The slave storage unit updates the slave image to include only those portions of the master image copied from the master image in order to synchronize the slave image to the master image.
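The FIG. 9 flow, and the Block 1 through Block 4 example of FIG. 10A and FIG. 10B, can be traced with a short Python sketch. It is a hypothetical model, not the patent's code: each image is a dict from block number to contents, and for Block 3 it assumes the "some but not all" images that were updated are the master and the first slave. The function name `resynchronize` and the `Image` alias are illustrative.

```python
from typing import Dict, List, Set

Image = Dict[int, str]  # block number -> block contents (illustrative model)


def resynchronize(master: Image, slaves: List[Image], log: Set[int]) -> int:
    """Copy only the blocks named in the write intent log from master to each slave.

    Slave contents are not compared first; each logged block is simply
    overwritten with the master's copy. Returns the number of block copies sent.
    """
    copies = 0
    for block in sorted(log):        # step 906: blocks that may be unsynchronized
        data = master[block]
        for slave in slaves:         # step 908: copy each one to every slave
            slave[block] = data
            copies += 1
    log.clear()                      # all images now agree on the logged blocks
    return copies


# State at the failure, per the FIG. 10 example (Block 5 was never written).
master = {1: "new", 2: "new", 3: "new", 4: "old", 5: "old"}
slave_a = {1: "new", 2: "new", 3: "new", 4: "old", 5: "old"}
slave_b = {1: "new", 2: "new", 3: "old", 4: "old", 5: "old"}
log = {2, 3, 4}  # FIG. 10B: entries for Blocks 2, 3, and 4
sent = resynchronize(master, [slave_a, slave_b], log)
```

With the log holding Blocks 2, 3, and 4, the sketch sends six block copies (three blocks to each of two slaves) even though only Block 3 on the second slave actually differed. Copying a few already-synchronized blocks is the price of skipping any comparison, and it is still much less than copying the entire master image.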

The present invention may be embodied in other specific forms without departing from the essence or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive.

We claim: 

1. A method for synchronizing a plurality of data images 
in a computer system, the computer system comprising a 
master storage unit for maintaining a master image and at 
least one slave storage unit for maintaining at least one slave 
image, the method comprising: 

receiving a number of write requests identifying portions 
of the plurality of data images to be written to the 
master image and to the at least one slave image; 

maintaining a log including a number of write entries 
identifying said portions of the plurality of data images; 

subsequent to storing the write entries in the log, initiating 
write operations to the master image and to the at least 
one slave image for updating the identified portions of 
the plurality of data images such that the write
operations can complete at different times and the write
operation to the at least one slave image can complete 
prior to the write operation to the master image; 

determining that there was a failure that may have caused 
said portions of the plurality of data images to become 
unsynchronized; and 

copying said portions of the plurality of data images 
identified in the log from the master image to the at 
least one slave image following recovery from said 
failure, wherein: 

said copying overwrites a portion of the at least one 
slave image that had been updated with a portion of 
the master image that had not been updated, if a write 
operation had completed in the at least one slave 
image but not in the master image; 

said copying overwrites a portion of the at least one 
slave image that had not been updated with a portion 
of the master image that had not been updated, if a 
write operation had completed in neither the at least 
one slave image nor the master image; 

said copying overwrites a portion of the at least one 
slave image that had been updated with a portion of 
the master image that had been updated, if a write 
operation had completed in both the at least one 
slave image and the master image; and 

said copying overwrites a portion of the at least one 
slave image that had not been updated with a portion 
of the master image that had been updated, if a write 
operation had completed in the master image but not 
in the at least one slave image. 

2. The method of claim 1, wherein maintaining the log 
further comprises removing unneeded write entries from the
log. 
3. The method of claim 2, wherein removing the unneeded
write entries from the log following synchronization of the 
plurality of data images comprises: 

determining a convenient time to delete the unneeded
write entries; and

deleting the unneeded write entries from the log at said
convenient time.

4. An apparatus for synchronizing a plurality of data 
images in a computer system, the plurality of data images 
including a master image and at least one slave image, the 
apparatus comprising: 

log maintenance logic operably coupled to receive a 
number of write requests identifying portions of the 
plurality of data images to be written to the master 
image and to the at least one slave image and to 
maintain a log including a number of write entries 
identifying said portions of the plurality of data images; 

storage logic operably coupled to initiate write operations 
to the master image and to the at least one slave image 
for updating the identified portions of the plurality of 
data images such that the write operations can complete 
at different times and the write operation to the at least 
one slave image can complete prior to the write opera- 
tion to the master image; 

failure detection logic operably coupled to determine that 
there was a failure that may have caused said portions 
of the plurality of data images to become
unsynchronized; and

synchronization logic operably coupled to copy said por- 
tions of the plurality of data images identified in the log 
from the master image to the at least one slave image 
following recovery from said failure, wherein: 

said copying overwrites a portion of the at least one slave
image that had been updated with a portion of the
master image that had not been updated, if a write
operation had completed in the at least one slave image
but not in the master image;

said copying overwrites a portion of the at least one slave 
image that had not been updated with a portion of the 
master image that had not been updated, if a write 
operation had completed in neither the at least one 
slave image nor the master image; 

said copying overwrites a portion of the at least one slave 
image that had been updated with a portion of the 
master image that had been updated, if a write operation
had completed in both the at least one slave image and 
the master image; and 

said copying overwrites a portion of the at least one slave 
image that had not been updated with a portion of the 
master image that had been updated, if a write opera- 
tion had completed in the master image but not in the 
at least one slave image. 

5. The apparatus of claim 4, wherein the log maintenance 
logic is operably coupled to remove unneeded write entries 
from the log. 

6. The apparatus of claim 5, wherein the log maintenance 
logic is operably coupled to determine a convenient time to 
delete the unneeded write entries and delete the unneeded 
write entries from the log at said convenient time. 

7. A program product comprising a computer readable 
medium having embodied therein a computer program for 
synchronizing a plurality of data images in a computer 
system, the plurality of data images including a master 
image and at least one slave image, the apparatus compris- 
ing: 
log maintenance logic programmed to receive a number
of write requests identifying portions of the plurality of 
data images to be written to the master image and to the 
at least one slave image and to maintain a log including 
a number of write entries identifying said portions of 
the plurality of data images;

storage logic programmed to initiate write operations to 
the master image and to the at least one slave image for 
updating the identified portions of the plurality of data 
images such that the write operations can complete at 
different times and the write operation to the at least 
one slave image can complete prior to the write opera- 
tion to the master image; 

failure detection logic programmed to determine that 
there was a failure that may have caused said portions 
of the plurality of data images to become unsynchro- 
nized; and 

synchronization logic programmed to copy said portions 
of the plurality of data images identified in the log from 
the master image to the at least one slave image 
following recovery from said failure, wherein: 
said copying overwrites a portion of the at least one 
slave image that had been updated with a portion of 
the master image that had not been updated, if a write 
operation had completed in the at least one slave 
image but not in the master image; 
said copying overwrites a portion of the at least one 
slave image that had not been updated with a portion 
of the master image that had not been updated, if a 
write operation had completed in neither the at least 
one slave image nor the master image; 
said copying overwrites a portion of the at least one 
slave image that had been updated with a portion of 
the master image that had been updated, if a write 
operation had completed in both the at least one 
slave image and the master image; and 
said copying overwrites a portion of the at least one 
slave image that had not been updated with a portion 
of the master image that had been updated, if a write 
operation had completed in the master image but not 
in the at least one slave image. 

8. The program product of claim 7, wherein the log 
maintenance logic is programmed to remove unneeded write 
entries from the log. 

9. The program product of claim 8, wherein the log 
maintenance logic is programmed to determine a convenient 
time to delete the write entries and delete the write entries 
from the log at said convenient time. 

10. A computer system comprising a master storage unit 
for maintaining a master image and at least one slave storage 
unit for maintaining at least one slave image, wherein: 

the master storage unit is operably coupled to receive a 
number of write requests identifying portions of the 
plurality of data images to be written to the master 
image and to the at least one slave image; maintain a 
log including a number of write entries identifying said 
portions of the plurality of data images; initiate write 
operations to the master image and to the at least one 
slave image for updating the identified portions of the 
plurality of data images such that the write operations 
can complete at different times and the write operation 
to the at least one slave image can complete prior to the 
write operation to the master image; determine that 
there was a failure that may have caused said portions
of the plurality of data images to become
unsynchronized; and copy those portions of the master image
identified in the log to the at least one slave storage unit
following recovery from said failure, wherein:
said copying overwrites a portion of the at least one
slave image that had been updated with a portion of
the master image that had not been updated, if a write
operation had completed in the at least one slave
image but not in the master image;
said copying overwrites a portion of the at least one
slave image that had not been updated with a portion
of the master image that had not been updated, if a
write operation had completed in neither the at least
one slave image nor the master image;
said copying overwrites a portion of the at least one
slave image that had been updated with a portion of
the master image that had been updated, if a write
operation had completed in both the at least one
slave image and the master image; and
said copying overwrites a portion of the at least one
slave image that had not been updated with a portion
of the master image that had been updated, if a write
operation had completed in the master image but not
in the at least one slave image; and
the at least one slave storage unit is operably coupled
to update those portions of the at least one slave
image copied from the master image in order to
synchronize the at least one slave image with the
master image.

11. A method for synchronizing a plurality of data images 
in a computer system, the computer system comprising a 
master storage unit for maintaining a master image and at 
least one slave storage unit for maintaining at least one slave
image, the method comprising: 

receiving a number of write requests identifying portions 
of the plurality of data images to be written to the 
master image and to the at least one slave image; 
maintaining a log including a number of write entries
identifying said portions of the plurality of data images; 
subsequent to storing the write entries in the log, initiating 
write operations to the master image and to the at least 
one slave image for updating the identified portions of 
the plurality of data images such that the write opera- 
tions can complete at different times and the write 
operation to the at least one slave image can complete 
prior to the write operation to the master image; 
determining that there was a failure that may have caused 
said portions of the plurality of data images to become 
unsynchronized; and 

copying said portions of the plurality of data images
identified in the log from the master image to the at
least one slave image following recovery from said
failure without first checking if a portion of the slave
image is different than a corresponding portion of the
master image, wherein said copying overwrites a
portion of the at least one slave image that had been
updated with a portion of the master image that had not
been updated, if a write operation had completed in the
at least one slave image but not in the master image.
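Claims 1, 4, 7, and 10 each recite four completion states for a logged portion, and claim 11 recites the first of them together with copying "without first checking" whether the slave portion differs. The sketch below is a hypothetical Python model of a single logged block, not anything taken from the patent: it applies the same unconditional master-to-slave copy in all four states and records what the slave holds afterwards. The names `copy_logged_block` and `claim_cases` are illustrative.

```python
from itertools import product
from typing import Dict, Tuple

OLD, NEW = "old data", "new data"


def copy_logged_block(master: Dict[int, str], slave: Dict[int, str], block: int) -> None:
    """Overwrite the slave's block with the master's, without checking for a difference."""
    slave[block] = master[block]


def claim_cases() -> Dict[Tuple[bool, bool], str]:
    """Map (master write completed, slave write completed) to the slave block after recovery."""
    outcome = {}
    for master_done, slave_done in product([False, True], repeat=2):
        master = {7: NEW if master_done else OLD}
        slave = {7: NEW if slave_done else OLD}
        copy_logged_block(master, slave, 7)
        outcome[(master_done, slave_done)] = slave[7]
    return outcome


for (master_done, slave_done), result in claim_cases().items():
    print(f"master done={master_done!s:5} slave done={slave_done!s:5} -> slave holds {result}")
```

In every state the slave ends up equal to the master. The notable case is the one claim 11 singles out (master not done, slave done): the slave's newer data is overwritten with the master's older data, so the slave is rolled back to the master image rather than the master being rolled forward.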

* * * * *